IN THE CLAIMS 



1. (Currently Amended) A computer-implemented method, comprising: 

identifying a candidate representing a plurality of instructions of a plurality of threads 
that perform one or more external memory accesses, the one or more external 
memory accesses having an identical base address, including
partitioning the plurality of instructions of the external memory accesses into 

one or more sets of potential candidates based on dependency 

relationships of the instructions, 
converting addresses of each external memory accesses into a form having a 

base address and an offset, 
grouping multiple potential candidates having substantially an identical base 

address into a single candidate, wherein a group having most of the 

potential candidates is selected as a final candidate for caching, and 
selecting one of the potential candidate sets as the candidate, instructions of the 

candidate satisfying a predetermined dependency relationship; and
inserting at least one of directives and instructions into an instruction stream 

corresponding to the identified candidate to maintain contents of at least one of 
a content addressable memory (CAM) and local memory (LM) of a processor 
and to modify at least one of the external memory accesses to access at
least one of the CAM and LM of the processor without having to perform the 
respective external memory access. 

2. (Cancelled) 

3. (Cancelled) 

4. (Currently Amended) The method of claim 1, wherein the base address is a
non-constant part and the offset is a constant part of the converted address.
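
By way of illustration only, the address conversion and grouping recited above might be sketched as follows. The sketch assumes each address is already available as a list of symbolic terms and integer constants, and it reads "a group having most of the potential candidates" as the largest group; the names split_address and select_final_candidate and the sample access labels are hypothetical and do not come from the claims.

```python
from collections import defaultdict


def split_address(terms):
    """Split an address expression into (base, offset).

    `terms` is a list of symbolic names (str) and integer constants whose
    sum forms the address. The symbolic terms form the non-constant base;
    the integer terms are folded into a single constant offset.
    """
    base = tuple(sorted(t for t in terms if isinstance(t, str)))
    offset = sum(t for t in terms if isinstance(t, int))
    return base, offset


def select_final_candidate(accesses):
    """Group accesses by base address and return the largest group.

    `accesses` maps a hypothetical access label to its address terms.
    Returns (base, [(label, offset), ...]). Ties go to the group that was
    encountered first, which is an arbitrary choice made for this sketch.
    """
    groups = defaultdict(list)
    for label, terms in accesses.items():
        base, offset = split_address(terms)
        groups[base].append((label, offset))
    return max(groups.items(), key=lambda item: len(item[1]))


if __name__ == "__main__":
    sample = {
        "ld1": ["buf", 0],
        "ld2": ["buf", 4],
        "ld3": ["hdr", 8],
        "st1": [8, "buf"],
    }
    print(select_final_candidate(sample))
```

In this sketch, accesses whose base differs from that of the selected group are simply left out of the returned candidate.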

5. (Currently Amended) A computer-implemented method, comprising:

identifying a candidate representing a plurality of instructions of a plurality of threads 
that perform one or more external memory accesses, the one or more external 
memory accesses having an identical base address, including 
partitioning the plurality of instructions of the external memory accesses into 

one or more sets of potential candidates based on dependency 

relationships of the instructions, 
converting addresses of each external memory accesses into a form having a 

base address and an offset, 
screening out one or more ineligible candidates from the potential candidates, 

wherein the ineligible candidates include a base address that is different 

from a remainder of the potential candidates, and
selecting one of the potential candidate sets as the candidate, instructions of the 

candidate satisfying a predetermined dependency relationship, 
inserting at least one of directives and instructions into an instruction stream 

corresponding to the identified candidate to maintain contents of at least one of 
a content addressable memory (CAM) and local memory (LM) of a processor 
and to modify at least one of the external memory accesses to access at least 
one of the CAM and LM of the processor without having to perform the 
respective external memory access.

6. (Cancelled) 

7. (Currently Amended) The method of claim 1, wherein the identifying the candidate
further comprises: 

performing a copy-forward transformation on addresses of each of the external 

memory accesses; and 
performing at least one of a global value numbering operation and a constant folding 

operation for each thread. 

8. (Currently Amended) A computer-implemented method, comprising:

identifying a candidate representing a plurality of instructions of a plurality of threads 
that perform one or more external memory accesses, the one or more external 
memory accesses having an identical base address, including 
partitioning the plurality of instructions of the external memory accesses into 

one or more sets of potential candidates based on dependency 

relationships of the instructions, 
converting addresses of each external memory accesses into a form having a 

base address and an offset, and 
selecting one of the potential candidate sets as the candidate, instructions of the 

candidate satisfying a predetermined dependency relationship; and
inserting at least one of directives and instructions into an instruction stream 

corresponding to the identified candidate to maintain contents of at least one of 
a content addressable memory (CAM) and local memory (LM) of a processor 
and to modify at least one of the external memory accesses to access at least 
one of the CAM and LM of the processor without having to perform the 
respective external memory access, including 

for each thread, reserving a sufficient space in the local memory to store data
portions of cache lines, and
inserting a caching instruction prior to each of the external memory accesses.

9. (Original) The method of claim 8, further comprising seeking the base address of each 
external memory access in the CAM to determine whether the CAM includes an entry 
that contains the base address being sought. 

10. (Original) The method of claim 9, wherein if the CAM includes an entry containing the 
base address being sought, the method further comprises: 

determining an offset of the local memory based on the entry of the CAM containing 

the base address being sought; and 
accessing data from an entry of the local memory referenced by the determined offset. 

11. (Original) The method of claim 9, wherein if the CAM does not include an entry
containing the base address being sought, the method further comprises allocating a least 
recently used (LRU) entry of the CAM having a base address of a previous external 
memory access. 

12. (Original) The method of claim 11, further comprising: 

loading data of a current external memory access from the external memory into an 
entry of the local memory referenced by the allocated LRU entry; and 

storing the base address of the current external memory access in the LRU entry of the 
CAM replacing the base address of the previous external memory access. 

13. (Original) The method of claim 11, further comprising: 

examining the base address of the previous external memory access in the allocated 
LRU entry to determine whether the base address is valid; and 
replicating data of an entry in the local memory corresponding to the allocated LRU
entry to a location of the external memory based address of the previous 
external memory access. 
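
By way of illustration only, the lookup, allocation, load and write-back behavior recited in claims 9 through 13 might be modeled as follows. This is a minimal software sketch under stated assumptions: the CAM is modeled as an ordered mapping from base address to a local memory entry index, external memory is modeled as a dictionary of lists, and a CAM entry counts as valid only while it holds a base address. The class name CamLmCache and its methods are hypothetical.

```python
from collections import OrderedDict


class CamLmCache:
    """Software model of a CAM-tagged cache held in local memory (LM)."""

    def __init__(self, num_entries, external_memory):
        self.external = external_memory        # base address -> list of words
        self.cam = OrderedDict()               # base -> LM index, LRU first
        self.lm = [None] * num_entries         # data portions of cache lines
        self.free = list(range(num_entries))   # entries with no valid base

    def _lookup(self, base):
        # Seek the base address in the CAM.
        if base in self.cam:
            self.cam.move_to_end(base)         # hit: mark most recently used
            return self.cam[base]
        # Miss: allocate an entry, preferring one that holds no valid base.
        if self.free:
            index = self.free.pop(0)
        else:
            # Take the least recently used entry; its base is valid, so
            # copy its data back to external memory before reuse.
            old_base, index = self.cam.popitem(last=False)
            self.external[old_base] = list(self.lm[index])
        # Load the current line and record its base address in the CAM.
        self.lm[index] = list(self.external[base])
        self.cam[base] = index
        return index

    def read(self, base, offset):
        return self.lm[self._lookup(base)][offset]

    def write(self, base, offset, value):
        self.lm[self._lookup(base)][offset] = value


if __name__ == "__main__":
    memory = {0x100: [1, 2], 0x200: [3, 4], 0x300: [5, 6]}
    cache = CamLmCache(2, memory)
    cache.write(0x100, 0, 9)    # served from LM after the initial load
    cache.read(0x200, 0)
    cache.read(0x300, 0)        # evicts 0x100 and writes it back
    print(memory[0x100])
```

The sketch writes an evicted line back unconditionally; tracking whether a line was modified is outside what the claims recite and is omitted here.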

14. (Currently Amended) A machine-readable storage medium having executable code to 
cause a machine to perform a method, the method comprising: 
identifying a candidate representing a plurality of instructions of a plurality of threads 
that perform one or more external memory accesses, the one or more external 
memory accesses having an identical base address, including 
partitioning the plurality of instructions of the external memory accesses into 

one or more sets of potential candidates based on dependency 

relationships of the instructions, 
converting addresses of each external memory accesses into a form having a 

base address and an offset, 
grouping multiple potential candidates having substantially an identical base 

address into a single candidate, wherein a group having most of the 

potential candidates is selected as a final candidate for caching, and 
selecting one of the potential candidate sets as the candidate, instructions of the 

candidate satisfying a predetermined dependency relationship; and 
inserting at least one of directives and instructions into an instruction stream 

corresponding to the identified candidate to maintain contents of at least one of 
a content addressable memory (CAM) and local memory (LM) of a processor 
and to modify at least one of the external memory accesses to access at least 
one of the CAM and LM of the processor without having to perform the 
respective external memory access. 

15. (Cancelled) 

16. (Cancelled) 

17. (Currently Amended) A machine-readable storage medium having executable code to
cause a machine to perform a method, the method comprising:

identifying a candidate representing a plurality of instructions of a plurality of threads 
that perform one or more external memory accesses, the one or more external 
memory accesses having an identical base address, including 
partitioning the plurality of instructions of the external memory accesses into 

one or more sets of potential candidates based on dependency 

relationships of the instructions, 
converting addresses of each external memory accesses into a form having a 

base address and an offset, 
screening out one or more ineligible candidates from the potential candidates,
wherein the ineligible candidates include a base address that is different 
from a remainder of the potential candidates, and 
selecting one of the potential candidate sets as the candidate, instructions of the 
candidate satisfying a predetermined dependency relationship, 
inserting at least one of directives and instructions into an instruction stream 

corresponding to the identified candidate to maintain contents of at least one of 
a content addressable memory (CAM) and local memory (LM) of a processor 
and to modify at least one of the external memory accesses to access at least 
one of the CAM and LM of the processor without having to perform the 
respective external memory access. 

18. (Cancelled) 

19. (Currently Amended) A machine-readable storage medium having executable code to
cause a machine to perform a method, the method comprising:

identifying a candidate representing a plurality of instructions of a plurality of threads 
that perform one or more external memory accesses, the one or more external 
memory accesses having an identical base address, including 
partitioning the plurality of instructions of the external memory accesses into 

one or more sets of potential candidates based on dependency 

relationships of the instructions, 
converting addresses of each external memory accesses into a form having a 

base address and an offset, and 
selecting one of the potential candidate sets as the candidate, instructions of the
candidate satisfying a predetermined dependency relationship; and 
inserting at least one of directives and instructions into an instruction stream 

corresponding to the identified candidate to maintain contents of at least one of 
a content addressable memory (CAM) and local memory (LM) of a processor 
and to modify at least one of the external memory accesses to access at least 
one of the CAM and LM of the processor without having to perform the 
respective external memory access, including 

for each thread, reserving a sufficient space in the local memory to store data 

portions of cache lines, and 
inserting a caching instruction prior to each of the external memory accesses. 

20. (Currently Amended) The machine-readable storage medium of claim 19, wherein the 
method further comprises seeking the base address of each external memory access in 
the CAM to determine whether the CAM includes an entry that contains the base 
address being sought. 

21. (Currently Amended) The machine-readable storage medium of claim 20, wherein if the
CAM includes an entry containing the base address being sought, the method further 
comprises: 

determining an offset of the local memory based on the entry of the CAM containing 

the base address being sought; and 
accessing data from an entry of the local memory referenced by the determined offset. 

22. (Currently Amended) The machine-readable storage medium of claim 20, wherein if the
CAM does not include an entry containing the base address being sought, the method
further comprises allocating a least recently used (LRU) entry of the CAM having a base 
address of a previous external memory access. 

23. (Currently Amended) The machine-readable storage medium of claim 22, wherein the 
method further comprises: 

loading data of a current external memory access from the external memory into an 
entry of the local memory referenced by the allocated LRU entry; and 

storing the base address of the current external memory access in the LRU entry of the 
CAM replacing the base address of the previous external memory access. 

24. (Currently Amended) The machine-readable storage medium of claim 22, wherein the 
method further comprises: 

examining the base address of the previous external memory access in the allocated 
LRU entry to determine whether the base address is valid; and 

replicating data of an entry in the local memory corresponding to the allocated LRU 
entry to a location of the external memory based address of the previous 
external memory access. 

25. - 30. (Cancelled). 

31. (New) A data processing system, comprising: 
a processor; and 

a memory for storing instructions, which when executed from the memory, cause the 
processor to perform operations, including 

identifying a candidate representing a plurality of instructions of a plurality of 
threads that perform one or more external memory accesses, the one or 
more external memory accesses having an identical base address,
including 

partitioning the plurality of instructions of the external memory 

accesses into one or more sets of potential candidates based on 
dependency relationships of the instructions, 
converting addresses of each external memory accesses into a form 

having a base address and an offset, 
grouping multiple potential candidates having substantially an identical 
base address into a single candidate, wherein a group having 
most of the potential candidates is selected as a final candidate 
for caching, and 
selecting one of the potential candidate sets as the candidate, 
instructions of the candidate satisfying a predetermined 
dependency relationship; and 
inserting at least one of directives and instructions into an instruction stream 
corresponding to the identified candidate to maintain contents of at 
least one of a content addressable memory (CAM) and local memory 
(LM) of a processor and to modify at least one of the external memory 
accesses to access at least one of the CAM and LM of the processor 
without having to perform the respective external memory access. 

32. (New) A data processing system, comprising: 
a processor; and 

a memory for storing instructions, which when executed from the memory, cause the 
processor to perform operations, including 

identifying a candidate representing a plurality of instructions of a plurality of 
threads that perform one or more external memory accesses, the one or 
more external memory accesses having an identical base address,
including 

partitioning the plurality of instructions of the external memory 

accesses into one or more sets of potential candidates based on 
dependency relationships of the instructions, 
converting addresses of each external memory accesses into a form 

having a base address and an offset, 
screening out one or more ineligible candidates from the potential 
candidates, wherein the ineligible candidates include a base 
address that is different from a remainder of the potential 
candidates, and 
selecting one of the potential candidate sets as the candidate, 
instructions of the candidate satisfying a predetermined 
dependency relationship, 
inserting at least one of directives and instructions into an instruction stream 
corresponding to the identified candidate to maintain contents of at 
least one of a content addressable memory (CAM) and local memory 
(LM) of a processor and to modify at least one of the external memory 
accesses to access at least one of the CAM and LM of the processor 
without having to perform the respective external memory access. 

33. (New) A data processing system, comprising: 
a processor; and 

a memory for storing instructions, which when executed from the memory, cause the 
processor to perform operations, including 

identifying a candidate representing a plurality of instructions of a plurality of 
threads that perform one or more external memory accesses, the one or 
more external memory accesses having an identical base address,
including 

partitioning the plurality of instructions of the external memory 

accesses into one or more sets of potential candidates based on 

dependency relationships of the instructions, 
converting addresses of each external memory accesses into a form 

having a base address and an offset, and 
selecting one of the potential candidate sets as the candidate, 

instructions of the candidate satisfying a predetermined 

dependency relationship; and 
inserting at least one of directives and instructions into an instruction stream 
corresponding to the identified candidate to maintain contents of at 
least one of a content addressable memory (CAM) and local memory 
(LM) of a processor and to modify at least one of the external memory 
accesses to access at least one of the CAM and LM of the processor 
without having to perform the respective external memory access, 
including 

for each thread, reserving a sufficient space in the local memory to store 

data portions of cache lines, and 
inserting a caching instruction prior to each of the external memory 

accesses. 